copula model
- Leisure & Entertainment > Games > Computer Games (0.93)
- Information Technology (0.68)
Copula Based Fusion of Clinical and Genomic Machine Learning Risk Scores for Breast Cancer Risk Stratification
Aich, Agnideep, Hewage, Sameera, Murshed, Md Monzur
Clinical and genomic models are both used to predict breast cancer outcomes, but they are often combined using simple linear rules that do not account for how their risk scores relate, especially at the extremes. Using the METABRIC breast cancer cohort, we studied whether directly modeling the joint relationship between clinical and genomic machine learning risk scores could improve risk stratification for 5-year cancer-specific mortality. We created a binary 5-year cancer-death outcome and defined two sets of predictors: a clinical set (demographic, tumor, and treatment variables) and a genomic set (gene-expression $z$-scores). We trained several supervised classifiers, such as Random Forest and XGBoost, and used 5-fold cross-validated predicted probabilities as unbiased risk scores. These scores were converted to pseudo-observations on $(0,1)^2$ to fit Gaussian, Clayton, and Gumbel copulas. Clinical models showed good discrimination (AUC 0.783), while genomic models had moderate performance (AUC 0.681). The joint distribution was best captured by a Gaussian copula (bootstrap $p=0.997$), which suggests a symmetric, moderately strong positive relationship. When we grouped patients based on this relationship, Kaplan-Meier curves showed clear differences: patients who were high-risk in both clinical and genomic scores had much poorer survival than those high-risk in only one set. These results show that copula-based fusion works in real-world cohorts and that considering dependencies between scores can better identify patient subgroups with the worst prognosis.
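The abstract does not say how the three copulas were fitted or how the risk groups were cut. The Python sketch below is therefore hypothetical. It rank-transforms two score vectors into pseudo-observations on $(0,1)$ and estimates Kendall's $\tau$. It then inverts $\tau$ into copula parameters using the standard relations: $\rho = \sin(\pi\tau/2)$ for the Gaussian copula, $\theta = 2\tau/(1-\tau)$ for Clayton, and $\theta = 1/(1-\tau)$ for Gumbel. Tau inversion is a simple stand-in estimator; the authors may well have used maximum likelihood. The `risk_group` rule, with its 0.5 cut-off on the pseudo-observations, is an assumed way to form "high in both" versus "high in one" groups. The toy scores are made up.

```python
import math

def _sign(x):
    """Return -1, 0 or 1."""
    return (x > 0) - (x < 0)

def pseudo_observations(scores):
    """Rank-transform scores to (0, 1) via u_i = rank_i / (n + 1); ties share their average rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1  # extend over a run of tied scores
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank of the run
        i = j + 1
    return [r / (n + 1) for r in ranks]

def kendall_tau(u, v):
    """Kendall's tau-a: (concordant - discordant pairs) / number of pairs."""
    n = len(u)
    s = sum(
        _sign(u[i] - u[j]) * _sign(v[i] - v[j])
        for i in range(n)
        for j in range(i + 1, n)
    )
    return 2 * s / (n * (n - 1))

def params_from_tau(tau):
    """Invert the standard tau relations; assumes positive dependence, 0 < tau < 1."""
    return {
        "gaussian_rho": math.sin(math.pi * tau / 2),
        "clayton_theta": 2 * tau / (1 - tau),
        "gumbel_theta": 1 / (1 - tau),
    }

def risk_group(u, v, threshold=0.5):
    """Hypothetical grouping rule: 'high' means a pseudo-observation above threshold."""
    high_clin, high_gen = u > threshold, v > threshold
    if high_clin and high_gen:
        return "both high"
    if high_clin:
        return "clinical only"
    if high_gen:
        return "genomic only"
    return "both low"

# Toy out-of-fold risk probabilities (made up, not METABRIC data).
clinical = [0.10, 0.40, 0.35, 0.80, 0.60]
genomic = [0.20, 0.30, 0.50, 0.70, 0.90]
u, v = pseudo_observations(clinical), pseudo_observations(genomic)
tau = kendall_tau(u, v)
params = params_from_tau(tau)
groups = [risk_group(a, b) for a, b in zip(u, v)]
print(tau, params, groups)
```

Tau inversion uses only the ranks, so it is unaffected by any monotone rescaling of the risk scores. That matches the idea of working with pseudo-observations rather than raw probabilities.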
- North America > United States > New York (0.04)
- North America > United States > Minnesota > Blue Earth County > Mankato (0.04)
- North America > United States > Louisiana > Lafayette Parish > Lafayette (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.83)
Cross-Modal Alignment via Variational Copula Modelling
Wu, Feng, Chan, Tsai Hor, Wang, Fuying, Yin, Guosheng, Yu, Lequan
Various data modalities are common in real-world applications (e.g., electronic health records, medical images and clinical notes in healthcare). It is essential to develop multimodal learning methods that aggregate information from multiple modalities. The main challenge is how to appropriately align and fuse the representations of different modalities into a joint distribution. Existing methods mainly rely on concatenation or the Kronecker product, which oversimplify the interaction structure between modalities and indicate a need to model more complex interactions. Additionally, the joint distribution of latent representations with higher-order interactions remains underexplored. A copula is a powerful statistical tool for modelling the interactions among variables, as it naturally bridges the joint distribution and the marginal distributions of multiple variables. We propose a novel copula-driven multimodal learning framework that focuses on learning the joint distribution of various modalities to capture the complex interactions among them. The key idea is to interpret the copula model as a tool for efficiently aligning the marginal distributions of the modalities. By assuming a Gaussian mixture distribution for each modality and a copula model on the joint distribution, our model can generate accurate representations for missing modalities. Extensive experiments on public MIMIC datasets demonstrate the superior performance of our model over competing methods. The code is available at https://github.com/HKU-MedAI/CMCM.
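The following Python is a deliberately tiny, hypothetical sketch of the "Gaussian mixture marginals plus copula" idea. It uses two scalar "modalities", each with a one-dimensional Gaussian mixture marginal, linked by a bivariate Gaussian copula with correlation `rho`. A missing value is imputed as its conditional median given the observed one. The Gaussian copula, the conditional-median rule and all function names are assumptions for illustration. The paper itself works with learned latent representations, not this closed-form toy.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
EPS = 1e-12  # keeps probabilities strictly inside (0, 1) before calling inv_cdf

def gmm_cdf(x, weights, means, sds):
    """CDF of a one-dimensional Gaussian mixture."""
    return sum(w * NormalDist(m, s).cdf(x) for w, m, s in zip(weights, means, sds))

def gmm_ppf(p, weights, means, sds, iters=200):
    """Inverse mixture CDF by bisection (the mixture CDF has no closed-form inverse)."""
    lo = min(means) - 50 * max(sds)
    hi = max(means) + 50 * max(sds)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if gmm_cdf(mid, weights, means, sds) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def impute_missing(x_obs, rho, gmm_obs, gmm_miss):
    """Hypothetical imputation: conditional median of the missing modality under a
    bivariate Gaussian copula with correlation rho."""
    u = min(max(gmm_cdf(x_obs, *gmm_obs), EPS), 1 - EPS)
    z_obs = STD_NORMAL.inv_cdf(u)
    # Under a Gaussian copula, z_miss | z_obs ~ N(rho * z_obs, 1 - rho**2); its median is rho * z_obs.
    z_miss = rho * z_obs
    return gmm_ppf(STD_NORMAL.cdf(z_miss), *gmm_miss)

# Toy marginals as (weights, means, standard deviations), made up for illustration.
gmm_a = ([0.5, 0.5], [-2.0, 2.0], [1.0, 1.0])
gmm_b = ([0.5, 0.5], [1.0, 5.0], [1.0, 1.0])
print(impute_missing(0.0, 0.6, gmm_a, gmm_b))  # median of gmm_a maps to median of gmm_b, about 3.0
print(impute_missing(1.5, 0.6, gmm_a, gmm_b))  # a higher observed value pulls the imputation up
```

Both the marginal maps and $\Phi$ are monotone. Pushing the conditional median of the normal score through them therefore gives the conditional median of the missing value. With `rho = 0` the imputation falls back to the marginal median of the missing modality.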
- Asia > China > Hong Kong (0.04)
- North America > United States > California > Orange County > Irvine (0.04)
- North America > Canada (0.04)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.68)
- Health & Medicine > Health Care Technology > Medical Record (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Research Report > New Finding (0.46)
- Research Report > Experimental Study (0.46)
Diffusion and Flow-based Copulas: Forgetting and Remembering Dependencies
Huk, David, Damoulas, Theodoros
Copulas are a fundamental tool for modelling multivariate dependencies in data, forming the method of choice in diverse fields and applications. However, the adoption of existing models for multimodal and high-dimensional dependencies is hindered by restrictive assumptions and poor scaling. In this work, we present methods for modelling copulas based on the principles of diffusions and flows. We design two processes that progressively forget inter-variable dependencies while leaving dimension-wise distributions unaffected, provably defining valid copulas at all times. We show how to obtain copula models by learning to remember the forgotten dependencies from each process, theoretically recovering the true copula at optimality. The first instantiation of our framework focuses on direct density estimation, while the second specialises in expedient sampling. Empirically, we demonstrate the superior performance of our proposed methods over state-of-the-art copula approaches in modelling complex and high-dimensional dependencies from scientific datasets and images. Our work enhances the representational power of copula models, empowering applications and paving the way for their adoption on larger scales and more challenging domains.

Given a collection of $d$ continuous random variables, a simple model for their joint probability density function is the product of the corresponding $d$ univariate densities (Peterson, 1987). This independence model ignores all dependence between the variables; by Sklar's theorem, the joint density can instead be written as the product of the univariate densities and a copula density, which carries all of the dependence. Indeed, the copula uniquely and exactly represents the inter-variable dependence, unlike correlation or mutual information (Geenens, 2023), fully disentangling the marginal behaviour from the joint. This disentanglement enables a modular approach to multivariate modelling: first, model each univariate marginal independently, and second, model the dependence between the variables with a copula.
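The idea of a process that forgets dependence while leaving each dimension's distribution untouched can be illustrated with a simple, hypothetical Python sketch. It is not either of the authors' two processes. The sketch works with normal scores $z_j = \Phi^{-1}(u_j)$ and moves each coordinate independently along an Ornstein-Uhlenbeck path, $z_j^{(t)} = e^{-t} z_j + \sqrt{1 - e^{-2t}}\,\varepsilon_j$. Every coordinate stays $\mathcal{N}(0,1)$, so mapping back through $\Phi$ gives uniform marginals (a valid copula) at every $t$. Meanwhile, the dependence fades towards the independence copula; for a Gaussian copula, the correlation decays as $\rho e^{-2t}$. Learning to reverse this path, the "remembering" step, is not shown.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def pearson(xs, ys):
    """Sample Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def sample_normal_scores(n, rho, rng):
    """Normal scores (Phi^-1(U), Phi^-1(V)) for n draws from a Gaussian copula with correlation rho."""
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        pairs.append((z1, z2))
    return pairs

def forget(pairs, t, rng):
    """Hypothetical forgetting step: move each coordinate independently along an
    Ornstein-Uhlenbeck path to time t. Each coordinate stays N(0, 1); dependence fades."""
    a = math.exp(-t)
    b = math.sqrt(1.0 - a * a)
    return [
        (a * z1 + b * rng.gauss(0.0, 1.0), a * z2 + b * rng.gauss(0.0, 1.0))
        for z1, z2 in pairs
    ]

def to_uniform(pairs):
    """Map normal scores back to (0, 1)^2: uniform marginals, i.e. a copula sample."""
    return [(STD_NORMAL.cdf(z1), STD_NORMAL.cdf(z2)) for z1, z2 in pairs]

rng = random.Random(0)
scores0 = sample_normal_scores(20000, 0.8, rng)  # strong initial dependence
scores_t = forget(scores0, 0.5, rng)             # partially forgotten at t = 0.5
r_t = pearson([z1 for z1, _ in scores_t], [z2 for _, z2 in scores_t])
u_t = to_uniform(scores_t)
print(f"correlation at t=0.5: {r_t:.3f} (theory {0.8 * math.exp(-1.0):.3f})")
```

The marginal-preservation argument does not depend on the starting copula. For any copula, $z_j \sim \mathcal{N}(0,1)$ plus independent noise gives $e^{-t} z_j + \sqrt{1-e^{-2t}}\,\varepsilon_j \sim \mathcal{N}(0,1)$.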
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Spain > Basque Country > Biscay Province > Bilbao (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Data Science > Data Mining (0.92)
- Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Vector Copula Variational Inference and Dependent Block Posterior Approximations
Fu, Yu, Smith, Michael Stanley, Panagiotelis, Anastasios
Variational inference (VI) is a popular method for estimating statistical and econometric models. The key to VI is the selection of a tractable density to approximate the Bayesian posterior. For large and complex models, a common choice is to assume independence between multivariate blocks in a partition of the parameter space. While this simplifies the problem, it can reduce accuracy. This paper proposes using vector copulas to capture dependence between the blocks parsimoniously. Tailored multivariate marginals are constructed using learnable cyclically monotone transformations. We call the resulting joint distribution a "dependent block posterior" approximation. We suggest vector copula models that yield tractable and flexible variational approximations. They allow for differing marginals, numbers of blocks, block sizes and forms of between-block dependence. They also allow the variational optimization to be solved using fast and efficient stochastic gradient methods. The efficacy and versatility of the approach are demonstrated using four different statistical models and 16 datasets whose posteriors are challenging to approximate. In all cases, our method produces more accurate posterior approximations than benchmark VI methods that assume either block independence or factor-based dependence, at limited additional computational cost.
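The abstract does not give the vector-copula families or the exact form of the learnable cyclically monotone transformations. The following Python is therefore only a hypothetical sketch of how one might sample from a dependent block approximation. Each block gets its own map $T_k(z) = A_k z + b_k + c_k \tanh(z)$. When $A_k$ is symmetric positive semidefinite and $c_k \ge 0$, this map is the gradient of a convex function and therefore cyclically monotone. Between-block dependence is introduced by correlating the blocks' standard-normal inputs through a single assumed parameter `rho`. Setting `rho = 0` recovers the block-independent approximation. No variational objective is optimized here, and the map and function names are illustrative only.

```python
import math
import random

def make_monotone_map(A, b, c):
    """Hypothetical block transform T(z) = A z + b + c * tanh(z), tanh applied elementwise.
    With A symmetric PSD and c >= 0, T is the gradient of the convex function
    0.5 z'Az + b'z + c * sum(log cosh z_i), so it is cyclically monotone."""
    d = len(b)

    def T(z):
        return [
            sum(A[i][j] * z[j] for j in range(d)) + b[i] + c * math.tanh(z[i])
            for i in range(d)
        ]

    return T

def sample_blocks(n, rho, T1, T2, dim, rng):
    """Draw n samples of two parameter blocks. Matching coordinates of the blocks'
    standard-normal inputs have correlation rho (an assumed between-block dependence);
    rho = 0 gives block independence."""
    draws = []
    for _ in range(n):
        z1 = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        z2 = [rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0) for x in z1]
        draws.append((T1(z1), T2(z2)))
    return draws

def pearson(xs, ys):
    """Sample Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hand-picked (not learned) parameters for two 2-dimensional blocks.
rng = random.Random(1)
T1 = make_monotone_map([[1.0, 0.3], [0.3, 0.5]], [0.0, 1.0], 0.5)
T2 = make_monotone_map([[0.8, 0.0], [0.0, 1.2]], [-1.0, 0.0], 0.2)
dependent = sample_blocks(5000, 0.7, T1, T2, 2, rng)
independent = sample_blocks(5000, 0.0, T1, T2, 2, rng)
r_dep = pearson([b1[0] for b1, _ in dependent], [b2[0] for _, b2 in dependent])
r_ind = pearson([b1[0] for b1, _ in independent], [b2[0] for _, b2 in independent])
print(f"cross-block correlation: dependent {r_dep:.2f}, independent {r_ind:.2f}")
```

In an actual VI setting, the map parameters and the dependence parameters would be learned by stochastic gradient ascent on the evidence lower bound; here they are fixed by hand.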
- Europe > Albania > Tirana County (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)